class: center, middle, inverse, title-slide

# Feature-based Time Series Forecasting
### Thiyanga S. Talagala
### Forecasting for Social Good (F4SG): Democratising Forecasting

---

<style>
.center2 {
  margin: 0;
  position: absolute;
  top: 50%;
  left: 50%;
  -ms-transform: translate(-50%, -50%);
  transform: translate(-50%, -50%);
}
</style>

<style type="text/css">
.remark-slide-content {
  font-size: 30px;
}
</style>

### Content

- Time series features
- feasts: *F*eature *E*xtraction *A*nd *S*tatistics for *T*ime *S*eries
- Feature-based time series forecasting
- Other applications
- Data visualization
- Anomaly detection
- What did we learn?
- Where to go from here?

---
background-image: url(img/jhu.png)
background-size: contain

---
class: inverse, center, middle
background-image: url(img/jhu.png)
background-position: 50% 60%
background-size: contain

Let's visualize the coronavirus pandemic!

---
background-image: url(img/coronavirus.png)
background-size: 90px
background-position: 100% 6%

# Data: coronavirus package

```r
install.packages("coronavirus")
# devtools::install_github("RamiKrispin/coronavirus")
```

```r
library(coronavirus)
head(coronavirus, 8)
```

```
        date province             country       lat      long      type cases
1 2020-01-22                  Afghanistan  33.93911  67.70995 confirmed     0
2 2020-01-22                      Albania  41.15330  20.16830 confirmed     0
3 2020-01-22                      Algeria  28.03390   1.65960 confirmed     0
4 2020-01-22                      Andorra  42.50630   1.52180 confirmed     0
5 2020-01-22                       Angola -11.20270  17.87390 confirmed     0
6 2020-01-22          Antigua and Barbuda  17.06080 -61.79640 confirmed     0
7 2020-01-22                    Argentina -38.41610 -63.61670 confirmed     0
8 2020-01-22                      Armenia  40.06910  45.03820 confirmed     0
```

---
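
A plot of the data above might be drawn along the following lines. This is a minimal sketch, not necessarily the code behind the original figure: it assumes the `dplyr` and `ggplot2` packages are installed, and the names `confirmed` and `p` are introduced here for illustration only. It sums the daily confirmed cases within each country and draws one line per country.

```r
library(coronavirus)
library(dplyr)
library(ggplot2)

# Keep confirmed cases only and collapse provinces into one series per country
confirmed <- coronavirus %>%
  filter(type == "confirmed") %>%
  group_by(country, date) %>%
  summarise(cases = sum(cases), .groups = "drop")

# One line per country, all on the same panel
p <- ggplot(confirmed, aes(x = date, y = cases, group = country)) +
  geom_line(alpha = 0.4) +
  labs(x = "Date", y = "Daily confirmed cases")
p
```
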
---
class: center, middle

# What problems do you see in the plot?

---
background-image: url(img/tukey.jpeg)
background-size: 200px
background-position: 100% 6%

# Time series features

- **Cognostics**: **Co**mputer-aided dia**gnostics** (John W. Tukey, 1985)

- Characteristics of time series

- Summary measures of time series

**Basic Principle**

- Transform a given time series `\(y=\{y_1, y_2, \cdots, y_n\}\)` into a feature vector `\(F = (f_1(y), f_2(y), \cdots, f_p(y))'\)`.

---
## Examples of time series features

.pull-left[
- length
- strength of trend
- strength of seasonality
- lag-1 autocorrelation
- spectral entropy
- proportion of zeros
- spikiness
]

.pull-right[
- curvature
- linearity
- stability
- number of peaks
- parameter estimates of Holt-Winters' additive method
- unit root test statistics
]

---
.pull-left[
#### Time-domain representation

<!-- -->
]

--

.pull-right[
#### Feature-domain representation

<!-- -->
]

---
### Time-domain representation

.pull-left[
```
$N0001
Time Series:
Start = 1975
End = 1988
Frequency = 1
 [1]  940.66 1084.86 1244.98 1445.02 1683.17 2038.15 2342.52 2602.45 2927.87
[10] 3103.96 3360.27 3807.63 4387.88 4936.99

$N0633
Time Series:
Start = 1974
End = 1989
Frequency = 1
 [1] 6664 7719 6750 6638 6666 6497 5867 6895 6077 6796 7488 7285 8014 8731 7650
[16] 7200

$N0625
Time Series:
Start = 1973
End = 1989
Frequency = 1
 [1] 1570 2270 2246 2072 2024 2120 2522 2560 2592 2576 2916 2874 3398 3580 3770
[16] 4222 4486

$N0645
Time Series:
Start = 1955
End = 1986
Frequency = 1
 [1] 6030 5070 5970 7870 5490 7600 5620 5040 6140 5410 8880 8130 6850 6990 6180
[16] 6310 5080 7400 5790 6682 6582 4167 7165 7426 7290 6900 7459 7003 6226 7453
[31] 5009 6115

$N1912
      Jan  Feb  Mar  Apr  May  Jun  Jul  Aug  Sep  Oct  Nov  Dec
1981 3959 3704 5149 5419 5151 6368 6427 5724 4809 4596 4454 4488
1982 3696 4103 5299 5156 5699 6326 5732 6029 5193 4888 4482 4417
1983 4539 3470 5358 5099 5979 6631 6163 6653 5648 4732 4429 3916
1984 4117 4636 4639 5033 5431 6100 6499 6716 4880 5206 4421 3897
1985 4479 4102 4301 5606 5762 5998 6279 5893 4825 4870 4088 3882
1986 4303 4065 4861 6173 5884 5856 5899 5293 4687 4796 3988 4080
1987 4048 4174 5353 5913 6196 6471 5950 5962 5203 4896 4085 4183
1988 3978 4371 5325 5928 5800 6568 6115 6202 5340 4983 4410 4531
1989 4630 4416 5290 5998 6153 6359 5826 6184 4989 5126 4668 4174
1990 4215 4284 5342 5109 6026 5835 5835 5876 4757 4920 4176 3886
1991 3987 3750 4591 5303 5822 5663

$N2012
        Jan    Feb    Mar    Apr    May    Jun    Jul    Aug    Sep    Oct
1979 2317.0 2265.0 2842.0 2485.0 2904.0 2781.5 2587.0 2734.5 2389.5 2657.0
1980 2273.0 2400.5 2582.5 2733.5 2951.0 3036.0 2871.0 2939.5 3004.0 3050.5
1981 2414.0 2564.5 3128.5 3303.5 3274.0 3609.0 3262.0 3309.0 3263.0 2998.5
1982 2413.5 2525.0 3055.5 3138.5 3294.5 3650.0 3131.0 3328.5 3242.5 2702.0
1983 2456.5 2594.0 3283.0 3384.0 3642.5 4043.0 3460.5 3771.5 3568.0 3284.5
1984 3122.5 3362.0 3803.5 3771.5 4196.0 4199.5 3925.5 4165.0 3722.0 3826.5
1985 3152.5 3106.0 3771.5 4326.5 4650.5 4402.5 4311.0 4321.5 4029.5 4102.0
1986 3493.0 3405.5 3800.0 4519.5 4569.0 4455.0 4313.5 4281.5 4207.5 4317.0
1987 3515.5 3808.0 4290.5 4566.0 4632.5 4719.0 4592.0 4508.0 4448.5 4524.0
1988 3621.5 3966.0 4633.5 4698.0 5012.5 5179.0 4556.5 4854.0 4663.0 4540.5
1989 3986.0 4086.0 4626.5 4789.0 5208.5 5301.0
        Nov    Dec
1979 2177.0 1870.0
1980 2381.5 2264.5
1981 2448.5 2202.0
1982 2378.0 2081.0
1983 2900.5 2529.5
1984 3165.0 2832.5
1985 3368.0 2855.0
1986 3300.5 3071.5
1987 3790.0 3438.0
1988 4098.5 3757.5
1989
```
]

.pull-right[
<!-- -->
]

---
### Feature-domain representation

.pull-left[
```
# A tibble: 6 × 3
  trend seasonality id
  <dbl>       <dbl> <chr>
1 0.995       0     N0001
2 0.591       0     N0633
3 0.961       0     N0625
4 0.178       0     N0645
5 0.251       0.906 N1912
6 0.968       0.927 N2012
```
]

.pull-right[
<!-- -->
]

---
## Features for all countries

```
# A tibble: 193 × 25
   country   trend_strength seasonal_strengt… seasonal_peak_we… seasonal_trough…
   <chr>              <dbl>             <dbl>             <dbl>            <dbl>
 1 Afghanis…          0.870             0.260                 6                5
 2 Albania            0.985             0.493                 2                6
 3 Algeria            0.991             0.308                 2                4
 4 Andorra            0.684             0.656                 6                5
 5 Angola             0.914             0.380                 1                6
 6 Antigua …          0.437             0.169                 2                5
 7 Argentina          0.980             0.774                 2                5
 8 Armenia            0.982             0.876                 2                6
 9 Australia          0.833             0.271                 1                3
10 Austria            0.984             0.712                 2                6
# … with 183 more rows, and 20 more variables: spikiness <dbl>,
#   linearity <dbl>, curvature <dbl>, stl_e_acf1 <dbl>, stl_e_acf10 <dbl>,
#   acf1 <dbl>, acf10 <dbl>, diff1_acf1 <dbl>, diff1_acf10 <dbl>,
#   diff2_acf1 <dbl>, diff2_acf10 <dbl>, season_acf1 <dbl>, pacf5 <dbl>,
#   diff1_pacf5 <dbl>, diff2_pacf5 <dbl>, season_pacf <dbl>,
#   zero_run_mean <dbl>, nonzero_squared_cv <dbl>, zero_start_prop <dbl>,
#   zero_end_prop <dbl>
```

---
.pull-left[
# Time-domain

<!-- -->
]

.pull-right[
# Feature-space
]

---
## Strength of trend

.pull-left[
<!-- -->
]

.pull-right[
`$$y_t = T_t + S_t + R_t$$`

`$$F_T = \max \left(0, 1 - \frac{\mathrm{Var}(R_t)}{\mathrm{Var}(T_t + R_t)} \right)$$`
]

---
## Strength of seasonality

.pull-left[
<!-- -->
]

.pull-right[
`$$y_t = T_t + S_t + R_t$$`

`$$F_S = \max \left(0, 1 - \frac{\mathrm{Var}(R_t)}{\mathrm{Var}(S_t + R_t)} \right)$$`
]

---
## *F*eature *E*xtraction *A*nd *S*tatistics for *T*ime *S*eries: *feasts*

.pull-left[
You can install the stable version from CRAN:

```r
install.packages("feasts")
```

You can install the development version from GitHub with:

```r
# install.packages("remotes")
remotes::install_github("tidyverts/feasts")
```
]

.pull-right[
]

---
class: center, middle

# Large-scale time series forecasting

---
class: inverse
background-image: url(img/rice1.png)
background-size: contain

---
class: inverse
background-image: url(img/rice2.png)
background-size: contain

---
class: center, middle

# Feature-based time series forecasting

---
class: inverse, center, middle
background-image: url(img/f1.png)
background-size: contain

---
class: inverse, center, middle
background-image: url(img/f2.png)
background-size: contain

---
class: inverse, center, middle
background-image: url(img/f3.png)
background-size: contain

---
class: inverse, center, middle
background-image: url(img/f4.png)
background-size: contain

---
class: inverse, center, middle
background-image: url(img/f5.png)
background-size: contain

---
class: inverse, center, middle
background-image: url(img/f6.png)
background-size: contain

---
class: inverse, center, middle
background-image: url(img/f7.png)
background-size: contain

---
class: inverse, center, middle
background-image: url(img/f8.png)
background-size: contain

---
class: inverse, center, middle
background-image: url(img/f9.png)
background-size: contain

---
class: inverse, center, middle
background-image: url(img/f10.png)
background-size: contain

---
class: inverse, center, middle
background-image: url(img/f11.png)
background-size: contain

---
class: inverse, center, middle
background-image: url(img/f12.png)
background-size: contain

---
class: inverse, center, middle
background-image: url(img/f13.png)
background-size: contain

---
## Machine learning algorithms

- Random forest
- XGBoost

---
background-image: url(img/forest.jpg)
background-size: cover

# Random forest

---
background-image: url(img/rf1.png)
background-size: contain

## Random forest

---
background-image: url(img/rf2.png)
background-size: contain

## Random forest

---
background-image: url(img/rf3.png)
background-size: contain

## Random forest

---
background-image: url(img/rf4.png)
background-size: contain

## Random forest

---
background-image: url(img/rf5.png)
background-size: contain

---
background-image: url(img/rf6.png)
background-size: contain

---
background-image: url(img/rf7.png)
background-size: contain

---
background-image: url(img/rf8.png)
background-size: contain

---
background-image: url(img/rf9.png)
background-size: contain

---
background-image: url(img/rf10.png)
background-size: contain

---
background-image: url(img/rf11.png)
background-size: contain

---
background-image: url(img/rf12.png)
background-size: contain

---
background-image: url(img/rf13.png)
background-size: contain